TITLE: ENHANCED WAVEFORM INTERPOLATIVE CODER



CROSS REFERENCE TO RELATED APPLICATIONS 

This application claims the benefit of Provisional Patent Application
Nos. 60/110,522, filed December 1, 1998, and 60/110,641, filed December 1,
1998.

BACKGROUND OF THE INVENTION 

Recently, there has been growing interest in developing toll-quality
speech coders at rates of 4 kbps and below. The speech quality produced by
waveform coders such as code-excited linear prediction (CELP) coders
degrades rapidly at rates below 5 kbps [B. S. Atal and M. R. Schroeder,
"Stochastic Coding of Speech at Very Low Bit Rate", Proc. Int. Conf. Comm.,
Amsterdam, pp. 1610-1613, 1984]. On the other hand, parametric coders
such as the waveform-interpolative (WI) coder, the sinusoidal-transform
coder (STC), and the multiband-excitation (MBE) coder produce good quality
at low rates, but they do not achieve toll quality [Y. Shoham, "High Quality
Speech Coding at 2.4 to 4.0 kbps Based on Time-Frequency Interpolation",
IEEE ICASSP'93, Vol. II, pp. 167-170, 1993; W. B. Kleijn and J. Haagen,
"Waveform Interpolation for Coding and Synthesis", in Speech Coding and
Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter
5, pp. 175-207, 1995; I. S. Burnett and D. H. Pham, "Multi-Prototype
Waveform Coding using Frame-by-Frame Analysis-by-Synthesis", IEEE
ICASSP'97, pp. 1567-1570, 1997; R. J. McAulay and T. F. Quatieri,
"Sinusoidal Coding", in Speech Coding and Synthesis by W. B. Kleijn and
K. K. Paliwal, Elsevier Science B. V., Chapter 4, pp. 121-173, 1995; and
D. Griffin and J. S. Lim, "Multiband Excitation Vocoder", IEEE Trans. ASSP,
Vol. 36, No. 8, pp. 1223-1235, August 1988]. This is mainly due to lack of
robustness to parameter estimation, which is commonly done in open loop,
and to inadequate modeling of non-stationary speech segments. Also, in
parametric coders the phase information is commonly not transmitted, for
two reasons: first, the phase is of secondary perceptual significance; and
second, no efficient phase quantization scheme is known. WI coders
typically use a fixed phase vector for the slowly evolving waveform
[Shoham, supra; Kleijn et al, supra; and Burnett et al, supra]. For example,
in Kleijn et al, a fixed male speaker extracted phase was used. On the other
hand, waveform coders such as CELP, by directly quantizing the waveform,
implicitly allocate an excessive number of bits to the phase information -
more than is perceptually required.

SUMMARY OF THE INVENTION 

The present invention overcomes the foregoing drawbacks by
implementing a paradigm that incorporates analysis-by-synthesis (AbS) for
parameter estimation, and a novel pitch search technique that is well suited
for the non-stationary segments. In one embodiment, the invention provides
a novel, efficient AbS vector quantization (VQ) encoding of the dispersion
phase of the excitation signal to enhance the performance of the waveform
interpolative (WI) coder at a very low bit-rate, which can be used for
parametric coders as well as for waveform coders. The enhanced
analysis-by-synthesis waveform interpolative (EWI) coder of this invention
employs this scheme, which incorporates perceptual weighting and does not
require any phase unwrapping.

The WI coders use non-ideal low-pass filters for downsampling and
upsampling of the slowly evolving waveform (SEW). In another embodiment
of the invention, a novel AbS SEW quantization scheme is provided, which
takes the non-ideal filters into consideration. An improved match between
reconstructed and original SEW is obtained, most notably in the transitions.

Pitch accuracy is crucial for high quality reproduced speech in WI
coders. Still another embodiment of the invention provides a novel pitch
search technique based on varying segment boundaries; it allows for locking
onto the most probable pitch period during transitions or other segments with
rapidly varying pitch.

Commonly in speech coding, the gain sequence is downsampled and
interpolated. As a result it is often smeared during plosives and onsets. To
alleviate this problem, a further embodiment of the invention provides a novel
switched-predictive AbS gain VQ scheme based on temporal weighting.
More particularly, the invention provides a method for interpolative
coding of input signals at low data rates in which there may be significant
pitch transitivity, the signals having an evolving waveform, the method
incorporating at least one, and preferably all, of the following steps:

(a) AbS VQ of the SEW whereby to reduce distortion in the signal by
obtaining the accumulated weighted distortion between an original sequence
of waveforms and a sequence of quantized and interpolated waveforms;

(b) AbS quantization of the dispersion phase;

(c) locking onto the most probable pitch period of the signal using both
a spectral domain pitch search and a temporal domain pitch search;

(d) incorporating temporal weighting in the AbS VQ of the signal gain,
whereby to emphasize local high energy events in the input signal;

(e) applying both high correlation and low correlation synthesis filters to
a vector quantizer codebook in the AbS VQ of the signal gain whereby to add
self correlation to the codebook vectors and maximize similarity between the
signal waveform and a codebook waveform;

(f) using each value of gain in the AbS VQ of the signal gain to obtain a
plurality of shapes, each composed of a predetermined number of values,
and comparing said shapes to a vector quantized codebook of shapes, each
having said predetermined number of values, e.g., in the range of 2 - 50,
preferably 5 - 20; and

(g) using a coder in which a plurality of bits, e.g., 4 bits, are allocated to
the SEW dispersion phase.

The method of the invention can be used in general with any waveform
signal, and is particularly useful with speech signals. In the step of AbS VQ of
the SEW, distortion is reduced in the signal by obtaining the accumulated
weighted distortion between an original sequence of waveforms and a
sequence of quantized and interpolated waveforms. In the step of AbS
quantization of the dispersion phase, at least one codebook is provided that
contains magnitude and phase information for predetermined waveforms.
The linear phase of the input is crudely aligned, then iteratively shifted and
compared to a plurality of waveforms reconstructed from the magnitude and
phase information contained in one or more codebooks. The reconstructed
waveform that best matches one of the iteratively shifted inputs is selected.

In the step of locking onto the most probable pitch period of the signal, the
invention includes searching the temporal domain pitch, defining a boundary
for a segment of said temporal domain pitch, maximizing the length of the
boundary by iteratively shrinking and expanding the segment, and maximizing
the similarity by shifting the segment. The spectral domain and temporal
domain searches are preferably conducted at 100 Hz and 500 Hz,
respectively.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of the AbS SEW vector quantization;

Figure 2 shows amplitude-time plots illustrating the improved waveform
matching obtained for a non-stationary speech segment by interpolating the
optimized SEW;

Figure 3 is a block diagram of the AbS dispersion phase vector quantization;

Figure 4 is a plot of the segmentally weighted signal-to-noise ratio of the
phase vector quantization versus the number of bits, for modified
intermediate reference system (MIRS) and for non-MIRS (flat) speech;

Figure 5 shows the results of subjective A/B tests comparing a 4-bit phase
vector quantization and a male extracted fixed phase;

Figure 6 is a block diagram of the pitch search of the EWI coder; and

Figure 7 is a block diagram of the switched-predictive AbS gain VQ using
temporal weighting.



DETAILED DESCRIPTION OF THE INVENTION 
The invention has a number of embodiments, some of which can be
used independently of the others to enhance speech and other signal coding
systems. The embodiments cooperate to produce a superior coding system,
involving AbS SEW optimization, and a novel dispersion phase quantizer,
pitch search scheme, switched-predictive AbS gain VQ, and bit allocation.






AbS SEW Quantization 

Commonly in WI coders the SEW is distorted by downsampling and
upsampling with non-ideal low-pass filters. In order to reduce such distortion,
an AbS SEW quantization scheme, illustrated in Figure 1, was used.
Consider the accumulated weighted distortion, $D_{wI}$, between the input
SEW vectors, $r_m$, and the interpolated vectors, $\tilde{r}_m$, given by:

$$D_{wI}\left(\hat{r}_M, \{r_m\}_{m=1}^{M+L-1}\right) = \sum_{m=1}^{M} [r_m - \tilde{r}_m]^H W_m [r_m - \tilde{r}_m] + \sum_{m=M+1}^{M+L-1} [1-\alpha(t_m)]^2 [r_m - \hat{r}_M]^H W_m [r_m - \hat{r}_M] \qquad (1)$$

where the first sum is that of the current-frame distortions and the second
sum is that of the lookahead distortions. H denotes Hermitian (transposed +
complex conjugate), M is the number of waveforms per frame, L is the
lookahead number of waveforms, $\alpha(t)$ is some increasing interpolation
function in the range $0 \le \alpha(t) \le 1$, and $W_m$ is a diagonal matrix
whose elements, $w_{kk}$, are the combined spectral-weighting and synthesis
of the k-th harmonic given by:



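$$w_{kk} = \frac{1}{K} \left| \frac{g\,A(z/\gamma_1)}{\hat{A}(z)\,A(z/\gamma_2)} \right|^2_{z=e^{j(2\pi/P)k}} ; \quad k = 1, \ldots, K \qquad (2)$$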



where P is the pitch period, K is the number of harmonics, g is the gain,
$A(z)$ and $\hat{A}(z)$ are the input and the quantized LPC polynomials
respectively, and the spectral weighting parameters satisfy
$0 < \gamma_2 < \gamma_1 \le 1$. It is also possible to leave out the inverse
of the number of harmonics, i.e., the 1/K parameter, or the gain, i.e., the g
parameter, or to use another combination of input and quantized LPC
polynomials, i.e., the $A(z)$ and $\hat{A}(z)$ parameters.

The interpolated SEW vectors are given by:

$$\tilde{r}_m = [1-\alpha(t_m)]\,\hat{r}_0 + \alpha(t_m)\,\hat{r}_M ; \quad m = 1, \ldots, M \qquad (3)$$

where t is time, m is the index of the waveform within the frame, and
$\hat{r}_0$ and $\hat{r}_M$ are the quantized SEW at the previous and at the
current frame respectively. The parameter $\alpha$ is an increasing linear
function from 0 to 1. It can be shown that the accumulated distortion in
equation (1) is equal to the sum of modeling distortion and quantization
distortion:



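$$D_{wI}\left(\hat{r}_M, \{r_m\}_{m=1}^{M+L-1}\right) = D_{wI}\left(r_{M,opt}, \{r_m\}_{m=1}^{M+L-1}\right) + D_w\left(\hat{r}_M, r_{M,opt}\right) \qquad (4)$$

where the quantization distortion is given by:

$$D_w\left(\hat{r}_M, r_{M,opt}\right) = (\hat{r}_M - r_{M,opt})^H\, W_{M,opt}\, (\hat{r}_M - r_{M,opt}) \qquad (5)$$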

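The optimal vector, $r_{M,opt}$, which minimizes the modeling distortion, is
given by:

$$r_{M,opt} = W_{M,opt}^{-1} \left[ \sum_{m=1}^{M} \alpha(t_m)\, W_m \left[ r_m - [1-\alpha(t_m)]\,\hat{r}_0 \right] + \sum_{m=M+1}^{M+L-1} [1-\alpha(t_m)]^2\, W_m\, r_m \right] \qquad (6)$$

where

$$W_{M,opt} = \sum_{m=1}^{M} \alpha(t_m)^2\, W_m + \sum_{m=M+1}^{M+L-1} [1-\alpha(t_m)]^2\, W_m \qquad (7)$$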

Therefore, VQ with the accumulated distortion of equation (1) can be
simplified by using the distortion of equation (5), and:

$$\hat{r}_M = \arg\min_{\hat{r}_i} \left\{ (\hat{r}_i - r_{M,opt})^H\, W_{M,opt}\, (\hat{r}_i - r_{M,opt}) \right\} \qquad (6)$$



An improved match between reconstructed and original SEW is
obtained, most notably in the transitions. Figure 2 illustrates the improved
waveform matching obtained for a non-stationary speech segment by
interpolating the optimized SEW.
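
The AbS SEW optimization described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
coder's implementation: the weighting matrices are taken as diagonal and
stored as per-harmonic lists, the lookahead terms are assumed to carry the
weight [1 - alpha(t_m)]^2, and all function names (interpolate, wdist,
accumulated_distortion, optimal_sew, quantize_sew) are hypothetical.

```python
def interpolate(r0_hat, rM_hat, a):
    """Equation (3): SEW vector interpolated with weight a = alpha(t_m)."""
    return [(1 - a) * x + a * y for x, y in zip(r0_hat, rM_hat)]


def wdist(x, y, w):
    """(x - y)^H W (x - y) for a diagonal W stored as a list of weights."""
    return sum(wk * abs(xk - yk) ** 2 for xk, yk, wk in zip(x, y, w))


def accumulated_distortion(rM_hat, r0_hat, r, w, alpha, M):
    """Equation (1): M current-frame terms plus the lookahead terms."""
    total = 0.0
    for m, (rm, wm, a) in enumerate(zip(r, w, alpha)):
        if m < M:  # current frame: compare with the interpolated vector
            total += wdist(rm, interpolate(r0_hat, rM_hat, a), wm)
        else:      # lookahead: assumed weight [1 - alpha(t_m)]^2
            total += (1 - a) ** 2 * wdist(rm, rM_hat, wm)
    return total


def optimal_sew(r0_hat, r, w, alpha, M):
    """Equations (6) and (7): returns (r_opt, w_opt), both per harmonic."""
    K = len(r0_hat)
    num, w_opt = [0j] * K, [0.0] * K
    for m, (rm, wm, a) in enumerate(zip(r, w, alpha)):
        for k in range(K):
            if m < M:
                num[k] += a * wm[k] * (rm[k] - (1 - a) * r0_hat[k])
                w_opt[k] += a * a * wm[k]
            else:
                num[k] += (1 - a) ** 2 * wm[k] * rm[k]
                w_opt[k] += (1 - a) ** 2 * wm[k]
    return [n / d for n, d in zip(num, w_opt)], w_opt


def quantize_sew(r_opt, w_opt, codebook):
    """Pick the codevector with the smallest distortion of equation (5)."""
    return min(range(len(codebook)),
               key=lambda i: wdist(codebook[i], r_opt, w_opt))
```

Because the accumulated distortion is quadratic in the quantized vector, the
search over the codebook only needs the single weighted distance to the
optimal vector instead of re-interpolating the whole frame for every
candidate.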

AbS Phase Quantization 

The dispersion-phase vector quantization scheme is illustrated in
Figure 3. Consider a pitch cycle which is extracted from the residual signal,
and is cyclically shifted such that its pulse is located at position zero. Let its
discrete Fourier transform (DFT) be denoted by $r$; the resulting DFT phase
is the dispersion phase, $\varphi$, which determines, along with the
magnitude $|r|$, the waveform's pulse shape. The SEW waveform $r$ is the
vector of complex DFT coefficients; each complex coefficient represents a
magnitude and a phase. After quantization, the components of the quantized
magnitude vector, $|\hat{r}|$, are multiplied by the exponential of the
quantized phases, $\hat{\varphi}(k)$, to yield the quantized waveform DFT,
$\hat{r}$, which is subtracted from the input DFT to produce the error DFT.
The error DFT is then transformed to the perceptual domain by weighting it
by the combined synthesis and weighting filter W(z)/A(z). In a crude linear
phase alignment, the encoder searches for the phase that minimizes the
energy of the perceptual domain error, shifting the signal such that the peak
is located at time zero. It then allows a refining cyclic shift of the input
waveform during the search, incrementally increasing or decreasing the
linear phase, to eliminate any residual phase shift between the input
waveform and the quantized waveform. Although shown in Figure 3 as
occurring immediately after the crude linear phase alignment, the refined
linear phase alignment step can occur elsewhere in the cycle, e.g., between
the × and + steps. Phase dispersion quantization aims to improve waveform
matching. Efficient quantization can be obtained by using the perceptually
weighted distortion:

$$D_w(r, \hat{r}) = (r - \hat{r})^H\, W\, (r - \hat{r}) \qquad (7)$$
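
The relation between a cyclic shift of the pitch cycle and a linear phase
term in its DFT can be illustrated as follows. This is a minimal sketch
rather than the encoder's procedure: it assumes the crude alignment simply
moves the largest-magnitude sample to position zero, and the names dft and
crude_alignment are hypothetical.

```python
import cmath


def dft(x):
    """Plain O(N^2) DFT, adequate for one pitch cycle in a sketch."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def crude_alignment(cycle):
    """Move the largest-magnitude sample (the pulse) to position zero.

    A cyclic advance by n0 samples multiplies DFT bin k by
    exp(+j*2*pi*k*n0/N), i.e. it adds a linear phase term.
    """
    N = len(cycle)
    n0 = max(range(N), key=lambda n: abs(cycle[n]))
    spectrum = dft(cycle)
    aligned = [spectrum[k] * cmath.exp(2j * cmath.pi * k * n0 / N)
               for k in range(N)]
    return n0, aligned
```

The refining step described in the text would then try small increments and
decrements of the same linear phase term around this starting point while
the codebook is searched.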

The magnitude is perceptually more significant than the phase, and
should therefore be quantized first. Furthermore, if the phase were quantized
first, the very limited bit allocation available for the phase would lead to an
excessively degraded spectral matching of the magnitude in favor of a
somewhat improved, but less important, matching of the waveform. For the
above distortion, the quantized phase vector is given by:

$$\hat{\varphi} = \arg\min_{\hat{\varphi}_i} \left\{ \left(r - e^{j\hat{\varphi}_i}|\hat{r}|\right)^H W \left(r - e^{j\hat{\varphi}_i}|\hat{r}|\right) \right\} \qquad (8)$$

where i is the running phase codebook index, and $e^{j\hat{\varphi}_i}$ is the
respective diagonal phase exponent matrix, given by

$$e^{j\hat{\varphi}_i} = \mathrm{diagonal}\left\{ e^{j\hat{\varphi}_i(k)} \right\} \qquad (9)$$

The AbS search for phase quantization is based on evaluating (8) for each
candidate phase codevector. Since only trigonometric functions of the phase
candidates are used, phase unwrapping is avoided. The EWI coder uses the
optimized SEW, $r_{M,opt}$, and the optimized weighting, $W_{M,opt}$, for the
AbS phase quantization.

[2x 

Equation (8) = arg max \ jr ty)f w (&. , <j>)d<j> 
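Equivalently, the quantized phase vector can be simplified to:

$$\hat{\varphi} = \arg\max_{\hat{\varphi}_i} \left\{ \sum_{k=1}^{K} w_{kk}\, |r(k)|\, |\hat{r}(k)|\, \cos\left(\varphi(k) - \hat{\varphi}_i(k)\right) \right\} \qquad (10)$$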


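where $\varphi(k)$ is the phase of $r(k)$, the k-th input DFT coefficient. The
average global distortion measure for a set of M vectors is:

$$D_{w,Global} = \frac{1}{M} \sum_{m=\{Data\ Vectors\}} D_w\left(r_m, e^{j\hat{\varphi}_m}|\hat{r}|_m\right) = \frac{1}{M} \sum_{m=\{Data\ Vectors\}} \frac{1}{K_m} \sum_{k=1}^{K_m} w_{kk,m} \left| r(k)_m - e^{j\hat{\varphi}(k)_m}\, |\hat{r}(k)|_m \right|^2 \qquad (11)$$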
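The centroid equation [A. Gersho et al, "Vector Quantization and
Signal Compression", Kluwer Academic Publishers, 1992] of the k-th
harmonic's phase for the j-th cluster, which minimizes the global distortion in
equation (11), is given by:

$$\hat{\varphi}(k)_{j^{th}\text{-cluster}} = \mathrm{atan}\left[ \frac{\sum_{m=\{j^{th}\text{-cluster}\}} \frac{1}{K_m}\, w_{kk,m}\, |r(k)_m|\, |\hat{r}(k)_m|\, \sin\left(\varphi(k)_m\right)}{\sum_{m=\{j^{th}\text{-cluster}\}} \frac{1}{K_m}\, w_{kk,m}\, |r(k)_m|\, |\hat{r}(k)_m|\, \cos\left(\varphi(k)_m\right)} \right]$$

These centroid equations use trigonometric functions of the phase, and
therefore do not require any phase unwrapping. It is possible to use
$|r(k)_m|^2$ instead of $|r(k)_m|\,|\hat{r}(k)_m|$.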
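
The centroid computation for one harmonic can be sketched as a weighted
circular mean. The sketch is illustrative only: the function name
phase_centroid is hypothetical, the per-vector weights are assumed to be
precomputed by the caller, and atan2 is used in place of a plain arctangent
so that the quadrant is resolved, which is a choice made here rather than
something stated in the text.

```python
import math


def phase_centroid(phases, weights):
    """Centroid of one harmonic's phase over a training cluster.

    phases[m]  : input phase of the k-th harmonic in training vector m
    weights[m] : non-negative weight, e.g. w_kk,m * |r(k)_m| * |r_hat(k)_m|
    """
    s = sum(c * math.sin(p) for p, c in zip(phases, weights))
    c_sum = sum(c * math.cos(p) for p, c in zip(phases, weights))
    return math.atan2(s, c_sum)  # atan2 keeps the correct quadrant
```

Two phases just either side of the wrap-around point, such as 3.1 and -3.1
radians, average to a value near pi in magnitude rather than to zero, which
shows why no unwrapping is needed.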

The phase vector's dimension depends on the pitch period and,
therefore, a variable dimension VQ has been implemented. In the WI system
the possible pitch period values were divided into eight ranges, and for each
range of pitch period an optimal codebook was designed such that vectors of
dimension smaller than that of the largest pitch period in each range are
zero padded.
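
A variable-dimension lookup of this kind might be organized as below. The
sketch rests on assumptions that are not in the text: the range edges in
RANGE_EDGES are invented for illustration (only the number of ranges, eight,
comes from the description), and the number of harmonics for a pitch period
P is taken to be P // 2.

```python
# Hypothetical range edges in samples; the text only says there are 8 ranges.
RANGE_EDGES = [20, 27, 35, 45, 58, 74, 95, 120, 148]


def pitch_range(P):
    """Index of the pitch range that contains pitch period P."""
    for i in range(len(RANGE_EDGES) - 1):
        if RANGE_EDGES[i] <= P < RANGE_EDGES[i + 1]:
            return i
    raise ValueError("pitch period outside the supported ranges")


def padded_dimension(i):
    """Vector dimension of range i: harmonics of its largest pitch period."""
    return (RANGE_EDGES[i + 1] - 1) // 2  # assumes P // 2 harmonics


def pad_phase_vector(phases, P):
    """Select the codebook index and zero pad the phase vector to its size."""
    i = pitch_range(P)
    dim = padded_dimension(i)
    return i, list(phases) + [0.0] * (dim - len(phases))
```

With these illustrative edges, a pitch period of 40 samples falls in the
third range, whose codebook dimension is 22, so a 20-element phase vector
receives two trailing zeros.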

Pitch changes over time cause the quantizer to switch among the
pitch-range codebooks. In order to achieve smooth phase variations
whenever such a switch occurs, overlapped training clusters were used.

The phase-quantization scheme has been implemented as a part of the
WI coder, and used to quantize the SEW phase. The objective performance
of the suggested phase VQ has been tested under the following conditions:

• Phase bits: 0-6 every 20 ms, a bitrate of 0-300 bit/second.

• 8 pitch ranges were selected, and training has been performed for each
range.

• Modified IRS (MIRS) filtered speech (Female+Male)
  • Training Set: 99,323 vectors.
  • Test Set: 83,099 vectors.

• Non-MIRS filtered speech (Female+Male)
  • Training Set: 101,359 vectors.
  • Test Set: 95,446 vectors.

• The magnitude was not quantized.

The segmental weighted signal-to-noise ratio (SNR) of the quantizer is
illustrated in Figure 4. The proposed system achieves approximately 14 dB
SNR for as low as 6 bits for non-MIRS filtered speech, and nearly 10 dB for
MIRS filtered speech.

Recent WI coders have used a male speaker extracted dispersion
phase [Kleijn et al, supra; Y. Shoham, "Very Low Complexity Interpolative
Speech Coding at 1.2 to 2.4 KBPS", IEEE ICASSP '97, pp. 1599-1602,
1997]. A subjective A/B test was conducted to compare the dispersion phase
of this invention, using only 4 bits, to a male extracted dispersion phase. The
test data included 16 MIRS speech sentences, 8 of which are of female
speakers, and 8 of male speakers. During the test, all pairs of files were
played twice in alternating order, and the listeners could vote for either of the
systems, or for no preference. The speech material was synthesized using a
WI system in which only the dispersion phase was quantized every 20 ms.
Twenty-one listeners participated in the test. The test results, illustrated in
Figure 5, show improvement in speech quality by using the 4-bit phase VQ.
The improvement is larger for female speakers than for male. This may be
explained by a higher number of bits per vector sample for female speakers,
by less spectral masking in female speech, and by a larger amount of
phase-dispersion variation for female speakers. The codebook design for
the dispersion-phase quantization involves a tradeoff between robustness in
terms of smooth phase variations and waveform matching. A locally
optimized codebook for each pitch value may improve the waveform
matching on the average, but may occasionally yield abrupt and excessive
changes which may cause temporal artifacts.

Pitch Search 

The pitch search of the EWI coder consists of a spectral domain
search employed at 100 Hz and a temporal domain search employed at 500
Hz, as illustrated in Figure 6. The spectral domain pitch search is based on
harmonic matching [McAulay et al, supra; Griffin et al, supra; and E. Shlomot,
V. Cuperman, and A. Gersho, "Hybrid Coding of Speech at 4 kbps", IEEE
Speech Coding Workshop, pp. 37-38, 1997]. The temporal domain pitch
search is based on varying segment boundaries. It allows for locking onto the
most probable pitch period even during transitions or other segments with
rapidly varying pitch (e.g., speech onset or offset or fast changing periodicity).
Initially, pitch periods, $P(n_i)$, are searched every 2 ms at instances $n_i$ by
maximizing the normalized correlation of the weighted speech $s_w(n)$, that is:



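$$P(n_i) = \arg\max_{\tau, N_1, N_2} \left\{ \rho(n_i, \tau, N_1, N_2) \right\} = \arg\max_{\tau, N_1, N_2} \left\{ \frac{\sum_{n=n_i-N_1\Delta}^{n_i+N_2\Delta} s_w(n)\, s_w(n+\tau)}{\sqrt{\sum_{n=n_i-N_1\Delta}^{n_i+N_2\Delta} s_w(n)^2 \; \sum_{n=n_i+\tau-N_1\Delta}^{n_i+\tau+N_2\Delta} s_w(n)^2}} \right\} \qquad (12)$$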



where $\tau$ is the shift in the segment, $\Delta$ is some incremental segment
used in the summations for computational simplicity, and
$0 < N_j < \lfloor 160/\Delta \rfloor$. Then, every 10 ms a weighted-mean pitch
value is calculated by:



$$P_{mean} = \frac{\sum_{i=1}^{5} \rho(n_i)\, P(n_i)}{\sum_{i=1}^{5} \rho(n_i)} \qquad (13)$$



where $\rho(n_i)$ is the normalized correlation for $P(n_i)$. The above values
(160, 10, 5) are for the particular coder and are used for illustration.
Equation (12) describes the temporal domain pitch search and the temporal
domain pitch refinement blocks of Figure 6. Equation (13) describes the
weighted average pitch block of Figure 6.
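
The temporal domain search and the weighted-mean pitch can be sketched as
follows. This is an illustration under assumptions, not the coder's search:
it evaluates the normalized correlation exhaustively over the shift and both
segment boundaries, the parameter max_steps is a hypothetical stand-in for
the upper bound on N_1 and N_2, and the function names are invented for the
example.

```python
import math


def norm_corr(s, n_i, tau, n1, n2, delta):
    """Normalized correlation between a segment and its copy shifted by tau."""
    seg = range(n_i - n1 * delta, n_i + n2 * delta + 1)
    num = sum(s[n] * s[n + tau] for n in seg)
    e0 = sum(s[n] ** 2 for n in seg)
    e1 = sum(s[n + tau] ** 2 for n in seg)
    return num / math.sqrt(e0 * e1) if e0 > 0 and e1 > 0 else 0.0


def temporal_pitch(s, n_i, taus, max_steps, delta):
    """Search shift and both segment boundaries; returns (P(n_i), rho(n_i))."""
    best_rho, best_tau = -2.0, None
    for tau in taus:
        for n1 in range(1, max_steps + 1):
            for n2 in range(1, max_steps + 1):
                if n_i - n1 * delta < 0 or n_i + n2 * delta + tau >= len(s):
                    continue  # segment would fall outside the buffer
                rho = norm_corr(s, n_i, tau, n1, n2, delta)
                if rho > best_rho:
                    best_rho, best_tau = rho, tau
    return best_tau, best_rho


def weighted_mean_pitch(pitches, rhos):
    """Equation (13): correlation-weighted mean of the local pitch values."""
    return sum(r * p for p, r in zip(pitches, rhos)) / sum(rhos)
```

Letting the left and right boundaries vary independently is what allows the
segment to avoid a transition on one side of the analysis instant while
still covering enough periodic signal on the other side.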



Gain Quantization 

The gain trajectory is commonly smeared during plosives and onsets
by downsampling and interpolation. This problem is addressed and speech
crispness is improved in accordance with an embodiment of the invention
that provides a novel switched-predictive AbS gain VQ technique, illustrated
in Figure 7. Switched-prediction is introduced to allow for different levels of
gain correlation, and to reduce the occurrence of gain outliers. In order to
improve speech crispness, especially for plosives and onsets, temporal
weighting is incorporated in the AbS gain VQ. The weighting is a monotonic
function of the temporal gain. Two codebooks of 32 vectors each are used.
Each codebook has an associated predictor coefficient, $P_i$, and a DC
offset, $D_i$. The quantization target vector is the DC removed log-gain
vector denoted by t(m). The search for the minimal weighted mean squared
error (WMSE) is performed over all the vectors, $c_{ij}(m)$, of the codebooks.
The quantized target, $\hat{t}(m)$, is obtained by passing the quantized
vector, $c_{ij}(m)$, through the synthesis filter. Since each quantized target
vector may have a different value of the removed DC, the quantized DC is
added temporarily to the filter memory after the state update, and the next
quantized vector's DC is subtracted from it before filtering is performed.
Since the predictor coefficients are known, direct VQ can be used to simplify
the computations. The synthesis filter adds self correlation to the codebook
vector. All combinations are tried, and whether high or low self correlation is
used depends on which yields the best results.
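
The switched-predictive AbS gain search can be sketched as below. Several
details are assumptions made for the illustration: the synthesis filter is
taken to be the first-order recursion 1/(1 - P_i z^-1), the filter memory is
represented by the previous quantized log-gain with the candidate branch's
DC subtracted before filtering, the temporal weighting is passed in as an
arbitrary monotonic function, and the names synthesize and search_gain_vq
are hypothetical.

```python
def synthesize(c, P, state):
    """Pass codevector c through 1/(1 - P z^-1), starting from `state`."""
    out, prev = [], state
    for x in c:
        prev = x + P * prev
        out.append(prev)
    return out


def search_gain_vq(log_gain, prev_q_log_gain, branches, weight):
    """Exhaustive AbS search over all branches and codevectors.

    branches : list of (P_i, D_i, codebook_i) tuples
    weight   : monotonic function of the gain giving the temporal weighting
    Returns (i, j, quantized log-gain vector).
    """
    w = [weight(g) for g in log_gain]
    best = None
    for i, (P, D, codebook) in enumerate(branches):
        target = [g - D for g in log_gain]   # DC-removed target t(m)
        state = prev_q_log_gain - D          # memory relative to this DC
        for j, c in enumerate(codebook):
            t_hat = synthesize(c, P, state)
            err = sum(wm * (t - th) ** 2
                      for wm, t, th in zip(w, target, t_hat))
            if best is None or err < best[0]:
                best = (err, i, j, [th + D for th in t_hat])
    return best[1], best[2], best[3]
```

Weighting the squared error by a function that grows with the gain makes
errors at local high energy events, such as plosives and onsets, count more
than errors in low energy stretches of the same vector.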


Bit Allocation 

The bit allocation of the coder is given in Table 1. The frame length is
20 ms, and ten waveforms are extracted per frame. The pitch and the gain
are coded twice per frame.

Table 1 - Bit allocation for EWI coder

Parameter     Bits / Frame    Bits / second
LPC           18              900
Pitch         2x6=12          600
Gain          2x6=12          600
REW           20              1000
SEW magn.     14              700
SEW phase     4               200
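
The arithmetic behind Table 1 can be checked with a few lines. The sketch
only restates the table and the 20 ms frame length from the text; the names
BITS_PER_FRAME, bits_per_second, TOTAL_BITS and TOTAL_RATE are chosen for
the example.

```python
FRAME_MS = 20  # frame length from the text

BITS_PER_FRAME = {
    "LPC": 18,
    "Pitch": 2 * 6,   # coded twice per frame
    "Gain": 2 * 6,    # coded twice per frame
    "REW": 20,
    "SEW magn.": 14,
    "SEW phase": 4,
}


def bits_per_second(bits_per_frame, frame_ms=FRAME_MS):
    """Convert a per-frame bit count into a bit rate."""
    return bits_per_frame * 1000 // frame_ms


TOTAL_BITS = sum(BITS_PER_FRAME.values())
TOTAL_RATE = bits_per_second(TOTAL_BITS)
```

The entries sum to 80 bits per 20 ms frame, which corresponds to the 4 kbps
rate of the coder.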
Subjective Results

A subjective A/B test was conducted to compare the 4 kbps EWI coder
of this invention to MPEG-4 at 4 kbps, and to G.723.1. The test data included
24 MIRS speech sentences, 12 of which are of female speakers, and 12 of
male speakers. Fourteen listeners participated in the test. The test results,
listed in Tables 2 to 4, indicate that the subjective quality of EWI exceeds that
of MPEG-4 at 4 kbps and of G.723.1 at 5.3 kbps, and it is slightly better than
that of G.723.1 at 6.3 kbps.

Table 2

Test      4 kbps WI    4 kbps MPEG-4
Female    65.48%       34.52%
Male      61.90%       38.10%
Total     63.69%       36.31%

Table 2 shows the results of the subjective A/B test comparing the 4 kbps
WI coder with 4 kbps MPEG-4. With 95% certainty the WI preference lies in
[58.63%, 68.75%].



Table 3

Test      4 kbps WI    5.3 kbps G.723.1
Female    57.74%       42.26%
Male      61.31%       38.69%
Total     59.52%       40.48%

Table 3 shows the results of the subjective A/B test comparing the 4 kbps
WI coder with 5.3 kbps G.723.1. With 95% certainty the WI preference lies in
[54.17%, 64.88%].
Table 4

Test      4 kbps WI    6.3 kbps G.723.1
Female    54.76%       45.24%
Male      52.98%       47.02%
Total     53.87%       46.13%

Table 4 shows the results of the subjective A/B test comparing the 4 kbps
WI coder with 6.3 kbps G.723.1. With 95% certainty the WI preference lies in
[48.51%, 59.23%].

The present invention incorporates several new techniques that
enhance the performance of the WI coder: analysis-by-synthesis
vector-quantization of the dispersion-phase, AbS optimization of the SEW, a
special pitch search for transitions, and switched-predictive
analysis-by-synthesis gain VQ. These features improve the algorithm and its
robustness. The test results indicate that the performance of the EWI coder
slightly exceeds that of G.723.1 at 6.3 kbps and therefore EWI achieves very
close to toll quality, at least under clean speech conditions.



THE CLAIMS

1. A method for interpolative coding input signals at low data rates in
which there is significant pitch transitivity, and wherein said signals
may have a slowly evolving waveform, the method incorporating at
least one of the following steps:

(a) analysis-by-synthesis vector-quantization of the slowly evolving 
waveform; 

(b) analysis-by-synthesis quantization of the dispersion phase; 

(c) locking onto the most probable pitch period of the signal using both 
a spectral domain pitch search and a temporal domain pitch search; 

(d) incorporating temporal weighting in the analysis-by-synthesis
vector-quantization of the signal gain;

(e) applying both high correlation and low correlation synthesis filters to 
a vector quantizer codebook in the analysis-by-synthesis vector- 
quantization of the signal gain whereby to add self correlation to the 
codebook vectors; 

(f) using each value of gain in the analysis-by-synthesis vector- 
quantization of the signal gain; and 

(g) using a coder in which a plurality of bits therein are allocated to the 
slowly evolving waveform phase. 

2. The method of claim 1 in which said signal is speech. 

3. The method of claim 1 in which said method incorporates each of 
steps (a) through (g). 

4. The method of claim 1 in which in the step of analysis-by-synthesis 
vector-quantization of the slowly evolving waveform, distortion is reduced 
in the signal by obtaining the accumulated weighted distortion between an 
original sequence of waveforms and a sequence of quantized and 
interpolated waveforms. 

5. The method of claim 1 including providing at least one codebook
containing magnitude and phase information for predetermined
waveforms, and in which the step of analysis-by-synthesis quantization of
the dispersion phase is conducted by crudely aligning the linear phase of
the input, then iteratively shifting said crudely aligned linear phase input,
comparing the shifted input to a plurality of waveforms reconstructed from
the magnitude and phase information contained in said at least one
codebook, and selecting the reconstructed waveform that best matches
one of the iteratively shifted inputs.

6. The method of claim 1 in which the method of searching the
temporal domain pitch, in said step of locking onto the most probable pitch
period of the signal, comprises defining a boundary for a segment of said
temporal domain pitch, selecting the best boundary and maximizing the
similarity by iteratively shifting the segment, and by shrinking and
expanding the segment.

7. The method of claim 1 in which the spectral domain pitch and temporal 
domain pitch searches, in said step of locking onto the most probable 
pitch period of the signals, are conducted respectively at 100 Hz and 500 
Hz. 

8. The method of claim 1 in which the step of the temporal weighting in 
the analysis-by-synthesis vector-quantization of the signal gain is changed 
as a function of time whereby to emphasize local high energy events in 
the input signal. 

9. The method of claim 1 in which selection between the high and low
correlation synthesis filters in the analysis-by-synthesis vector-quantization
of the signal gain is made to maximize similarity between the gain
waveform and a codebook waveform.

10. The method of claim 1 wherein each value of gain in the analysis-by-
synthesis vector-quantization of the signal gain is used to obtain a plurality
of shapes, each composed of a predetermined number of values, and
comparing said shapes to a vector quantized codebook of shapes, each
having said predetermined number of values.

11. A method for interpolative coding input signals at low data rates in
which said signals have a slowly evolving waveform, the method 
incorporating analysis-by-synthesis vector-quantization of the slowly 
evolving waveform. 

12. The method of claim 11 in which distortion is reduced in the signal by
obtaining the accumulated weighted distortion between an original 
sequence of waveforms and a sequence of quantized and interpolated 
waveforms. 



13. A method for interpolative coding input signals at low data speeds in 
which the signal has a slowly evolving waveform having a dispersion 
phase, the method incorporating analysis-by-synthesis quantization of the 
dispersion phase. 



14. The method of claim 13 including providing at least one codebook 
containing magnitude and phase information for predetermined 
waveforms, crudely aligning the linear phase of the input, then iteratively 
shifting said crudely aligned linear phase input, comparing the shifted 
input to a plurality of waveforms reconstructed from the magnitude and 
phase information contained in said at least one codebook, and selecting 
the reconstructed waveform that best matches one of the iteratively shifted 
inputs. 



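15. The method of claim 14 in which the average global distortion measure
for a particular vector set M is:

$$D_{w,Global} = \frac{1}{M} \sum_{m=\{Data\ Vectors\}} \frac{1}{K_m} \sum_{k=1}^{K_m} w_{kk,m} \left| r(k)_m - e^{j\hat{\varphi}(k)_m}\, |r(k)|_m \right|^2$$

and including the step of minimizing the global distortion thereof by using
the following formula for the k-th harmonic's phase for the j-th cluster:

$$\hat{\varphi}(k)_{j^{th}\text{-cluster}} = \mathrm{atan}\left[ \frac{\sum_{m=\{j^{th}\text{-cluster}\}} \frac{1}{K_m}\, w_{kk,m}\, |r(k)_m|^2\, \sin\left(\varphi(k)_m\right)}{\sum_{m=\{j^{th}\text{-cluster}\}} \frac{1}{K_m}\, w_{kk,m}\, |r(k)_m|^2\, \cos\left(\varphi(k)_m\right)} \right]$$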

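16. The method of claim 14 in which the average global distortion measure
for a particular vector set M is:

$$D_{w,Global} = \frac{1}{M} \sum_{m=\{Data\ Vectors\}} \frac{1}{K_m} \sum_{k=1}^{K_m} w_{kk,m} \left| r(k)_m - e^{j\hat{\varphi}(k)_m}\, |\hat{r}(k)|_m \right|^2$$

and including the step of minimizing the global distortion thereof by using
the following formula for the k-th harmonic's phase for the j-th cluster:

$$\hat{\varphi}(k)_{j^{th}\text{-cluster}} = \mathrm{atan}\left[ \frac{\sum_{m=\{j^{th}\text{-cluster}\}} \frac{1}{K_m}\, w_{kk,m}\, |r(k)_m|\, |\hat{r}(k)_m|\, \sin\left(\varphi(k)_m\right)}{\sum_{m=\{j^{th}\text{-cluster}\}} \frac{1}{K_m}\, w_{kk,m}\, |r(k)_m|\, |\hat{r}(k)_m|\, \cos\left(\varphi(k)_m\right)} \right]$$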

17. A method for interpolative coding input signals at low data rates, 
comprising locking onto the most probable pitch period of the signal using 
both a spectral domain pitch search and a temporal domain pitch search. 

18. The method of claim 17 in which the method of searching the
temporal domain pitch comprises defining a boundary for a segment of
said temporal domain pitch, selecting the location of the boundaries that
maximize the similarity by iteratively shrinking and expanding the segment
and by shifting the segment.

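19. The method of claim 18 in which the method of searching the temporal
domain pitch is in accordance with the formula:

$$P(n_i) = \arg\max_{\tau, N_1, N_2} \left\{ \rho(n_i, \tau, N_1, N_2) \right\} = \arg\max_{\tau, N_1, N_2} \left\{ \frac{\sum_{n=n_i-N_1\Delta}^{n_i+N_2\Delta} s_w(n)\, s_w(n+\tau)}{\sqrt{\sum_{n=n_i-N_1\Delta}^{n_i+N_2\Delta} s_w(n)^2 \; \sum_{n=n_i+\tau-N_1\Delta}^{n_i+\tau+N_2\Delta} s_w(n)^2}} \right\}$$

where $\tau$ is the shift in the segment, $\Delta$ is some incremental segment
used in the summations for computational simplicity, and $N_j$ is a number
calculated for the coder.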



20. The method of claim 19 including the step of obtaining the weighted
average pitch in accordance with the formula:

$$P_{mean} = \frac{\sum_{i} \rho(n_i)\, P(n_i)}{\sum_{i} \rho(n_i)}$$

where $\rho(n_i)$ is the normalized correlation for $P(n_i)$.

21. The method of claim 19 in which the spectral domain pitch and
temporal domain pitch searches in said step of locking onto the most
probable pitch period of the signals are conducted respectively at 100 Hz
and 500 Hz.
22. A method for interpolative coding input signals at low data speeds,
comprising incorporating temporal weighting in the analysis-by-synthesis 
vector-quantization of the signal gain. 

23. The method of claim 22 in which the temporal weighting is changed as 
a function of time whereby to emphasize local high energy events in the 
input signal. 

24. A method for interpolative coding input signals at low data speeds,
comprising applying both high correlation and low correlation synthesis 
filters to a vector quantizer codebook in the analysis-by-synthesis vector- 
quantization of the signal gain whereby to add self correlation to the 
codebook vectors. 

25. The method of claim 24 in which selection between the high and low 
correlation synthesis filters is made to maximize similarity between the 
signal waveform and a codebook waveform. 

26. A method for interpolative coding input signals at low data speeds,
comprising using each value of gain in the analysis-by-synthesis vector- 
quantization of the signal gain. 

27. The method of claim 26 wherein each value of gain is used to obtain a 
plurality of shapes, each composed of a predetermined number of values, 
and comparing said shapes to a vector quantized codebook of shapes, 
each having said predetermined number of values. 

28. The method of claim 27 in which said predetermined number of values 
is in the range of 2 to 50. 

29. The method of claim 28 in which said predetermined number of values 
is in the range of 5 to 20. 



30. A method for interpolative coding input signals at low data speeds in 
which said signals have a slowly evolving waveform, comprising using a 
coder in which a plurality of bits therein are allocated to the slowly evolving 
waveform phase. 



31. The method of claim 30 in which 4 bits are allocated to the slowly
evolving waveform phase in the coder. 



WO 00/33297 



PCT/US99/28449 



1/4

[Drawing sheet 1/4 not reproduced]

SUBSTITUTE SHEET (RULE 26)



[Drawing sheets with FIG. 2 to FIG. 5 not reproduced; legible labels follow]

FIG. 2: amplitude versus time (sec), time axis from 0.69 to 0.73; traces
labelled ORIGINAL, OPTIMIZED and NON-OPTIMIZED.

FIG. 3: blocks labelled PITCH-CYCLE WAVEFORM'S DFT, CRUDE
LINEAR-PHASE ALIGNMENT, REFINED LINEAR-PHASE ALIGNMENT,
PHASE CODEBOOK, PITCH and MIN |·|².

[Figure label not legible]: plot with horizontal axis PHASE BITS.

FIG. 5: legend entries 4BIT VQ, MALE EXTRACTED and NO PREFERENCE;
groups FEMALE and MALE.



[Drawing sheet 4/4 not reproduced; legible labels follow]

FIG. 6: input SPEECH; blocks labelled SPECTRAL DOMAIN PITCH
SEARCH+TRACKER, TEMPORAL DOMAIN PITCH SEARCH, TEMPORAL
DOMAIN PITCH REFINEMENT, USE 4ms WAVEFORM LENGTH and
WEIGHTED-AVERAGE PITCH; a GOOD PITCHES? decision with YES and NO
branches; rate labels 100Hz and 500Hz.

FIG. 7: input LOG-GAIN g(m); blocks labelled DC CODEBOOK (D_i),
PREDICTOR CODEBOOK (P_i), VECTOR QUANTIZER CODEBOOK (c_ij(m)),
SYNTHESIS FILTER 1/(1 - P_i z^-1), TEMPORAL WEIGHTING and MIN |·|²;
signals t(m) and its quantized version.



DECLARATION AND PETITION 

As the below named inventor, I hereby declare that:
My residence, post office address and citizenship are as stated below next
to my name.

I believe that I am the original, first inventor of the subject matter which 
is claimed and for which a patent is sought on the invention entitled 
ENHANCED WAVEFORM INTERPOLATIVE CODER, the specification of 
which was filed on December 10, 1999 and was assigned International 